Scope completion truncation to active provider#49
Conversation
📝 WalkthroughWalkthroughThis PR refactors prompt truncation in the OpenAI completion flow to use only the selected model's token limits instead of applying the most restrictive limit across all configured providers. The truncation API now accepts a provider parameter, the internal logic is simplified to check only the resolved model, and callers delegate truncation responsibility to the request factory. ChangesProvider-Aware Prompt Truncation
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| public String truncatePromptForCompletion(String prompt) { | ||
| return truncatePromptForCompletion(prompt, RateLimitService.ApiProvider.OPENAI); | ||
| } |
There was a problem hiding this comment.
Dead public API after this refactor
The no-arg truncatePromptForCompletion(String prompt) no longer has any production caller — its only call site in OpenAIStreamingService.complete() was removed by this PR, and buildCompletionRequest now invokes the private truncatePromptForCompletion(String, String) directly. Keeping the method conflicts with [AB1d] ("Delete unused code instead of keeping it 'just in case'") and [RC1b] ("No compatibility shims that hide defects"). The same applies to the new two-arg public overload at line 138, which is also reachable only from tests — buildCompletionRequest bypasses it and calls the private method itself.
Consider removing both public overloads and testing via buildCompletionRequest directly, which exercises the full provider-to-model-id path, or make the provider-arg overload the single public entry point.
Context Used: AGENTS.md (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/main/java/com/williamcallahan/javachat/service/OpenAiRequestFactory.java
Line: 127-129
Comment:
**Dead public API after this refactor**
The no-arg `truncatePromptForCompletion(String prompt)` no longer has any production caller — its only call site in `OpenAIStreamingService.complete()` was removed by this PR, and `buildCompletionRequest` now invokes the private `truncatePromptForCompletion(String, String)` directly. Keeping the method conflicts with [AB1d] ("Delete unused code instead of keeping it 'just in case'") and [RC1b] ("No compatibility shims that hide defects"). The same applies to the new two-arg public overload at line 138, which is also reachable only from tests — `buildCompletionRequest` bypasses it and calls the private method itself.
Consider removing both public overloads and testing via `buildCompletionRequest` directly, which exercises the full provider-to-model-id path, or make the provider-arg overload the single public entry point.
**Context Used:** AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=c73518f6-94f2-4eb4-a597-3be5ff49a896))
How can I resolve this? If you propose a fix, please make it concise.| boolean gpt5Family = isGpt5Family(modelId); | ||
| boolean reasoningModel = gpt5Family || canonicalModelName(modelId).startsWith("o"); | ||
|
|
||
| int tokenLimit = reasoningModel ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT; |
There was a problem hiding this comment.
o-series models receive GPT-5's 7K token limit regardless of context window
canonicalModelName(modelId).startsWith("o") captures o1, o3, o3-mini, etc. and routes them to MAX_TOKENS_GPT5_INPUT (7 000 tokens). Many o-series models expose far larger context windows and this mismatch silently truncates prompts that would fit. This was also true before the PR, but the refactor now makes this path the single authoritative one for both providers, so the blast radius is wider. At minimum the assumption should be documented; at best, o-series models should have their own named constant and explicit limit.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/main/java/com/williamcallahan/javachat/service/OpenAiRequestFactory.java
Line: 149-152
Comment:
**`o`-series models receive GPT-5's 7K token limit regardless of context window**
`canonicalModelName(modelId).startsWith("o")` captures `o1`, `o3`, `o3-mini`, etc. and routes them to `MAX_TOKENS_GPT5_INPUT` (7 000 tokens). Many o-series models expose far larger context windows and this mismatch silently truncates prompts that would fit. This was also true before the PR, but the refactor now makes this path the single authoritative one for both providers, so the blast radius is wider. At minimum the assumption should be documented; at best, o-series models should have their own named constant and explicit limit.
How can I resolve this? If you propose a fix, please make it concise.
Summary
buildCompletionRequestso fallback providers use their own model limitsgpt-4oalongside the default GitHub Modelsopenai/gpt-5Fixes #40.
Verification
JAVA_HOME=C:\Users\Admin\AppData\Local\Codex\jdks\temurin-25\jdk-25.0.3+9 ./gradlew.bat test --tests com.williamcallahan.javachat.service.OpenAiRequestFactoryTestJAVA_HOME=C:\Users\Admin\AppData\Local\Codex\jdks\temurin-25\jdk-25.0.3+9 ./gradlew.bat test --tests com.williamcallahan.javachat.service.OpenAIStreamingServiceTestJAVA_HOME=C:\Users\Admin\AppData\Local\Codex\jdks\temurin-25\jdk-25.0.3+9 ./gradlew.bat spotlessCheckgit diff --checkFull
./gradlew.bat testruns 244 tests but fails on the existing Windows environment becauseEnvironmentVariablePrecedenceTesthardcodes/bin/bash, which is not present here.Greptile Summary
This PR fixes issue #40 by moving prompt truncation inside
buildCompletionRequestso that each provider attempt uses its own model's token limits, rather than applying a single pre-loop truncation derived from the union of both providers' characteristics.OpenAIStreamingService.complete()now passes the raw prompt tobuildCompletionRequest, which calls the privatetruncatePromptForCompletion(String, String)with the resolved model ID for the active provider.OpenAiRequestFactorygains a public provider-arg overload oftruncatePromptForCompletionand a private model-ID overload that contains the actual logic; the original no-arg public method now delegates to the OPENAI provider by default.gpt-4o(OpenAI) leaves an ~8K-token prompt untouched whileopenai/gpt-5(GitHub Models) truncates it with the appropriate notice.Confidence Score: 4/5
Safe to merge; the core logic change is correct and well-tested for the targeted scenarios.
The refactor correctly scopes truncation to the active provider on each attempt. Two dead public overloads have no production callers after the change, and the o-series 7K limit assumption silently under-serves larger-context models. Neither issue affects correctness for the currently configured models.
OpenAiRequestFactory.java — the dead public overloads and the o-series token limit assumption are worth a second look before this code path grows further.
Important Files Changed
Sequence Diagram
sequenceDiagram participant C as Caller participant S as OpenAIStreamingService participant F as OpenAiRequestFactory participant P as ProviderRoutingService C->>S: complete(prompt, temperature) S->>P: selectAvailableProviderCandidates(...) P-->>S: [providerCandidate, ...] loop for each providerCandidate S->>F: buildCompletionRequest(prompt, temperature, activeProvider) Note over F: normalizedModelId(useGitHubModels) F->>F: truncatePromptForCompletion(prompt, modelId) Note over F: gpt5Family / reasoningModel check F-->>S: ResponseCreateParams (with truncated prompt) S->>P: client.responses().create(requestParameters) alt success P-->>S: Response S-->>C: Mono.just(text) else RuntimeException S->>P: recordProviderFailure(...) Note over S: fallback to next provider if eligible end endPrompt To Fix All With AI
Reviews (1): Last reviewed commit: "Scope completion truncation to active pr..." | Re-trigger Greptile
Context used: